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Abstract 

We deal with Bayesian inference for Beta autoregressive processes. 
We restrict our attention to the class of conditionally linear processes. 
These processes are particularly suitable for forecasting purposes, but 
are difficult to estimate due to the constraints on the parameter space. 
We provide a full Bayesian approach to the estimation and include the 
parameter restrictions in the inference problem by a suitable specification 
of the prior distributions. Moreover in a Bayesian framework parameter 
estimation and model choice can be solved simultaneously. In particular 
we suggest a Markov-Chain Monte Carlo (MCMC) procedure based on a 
Metropolis-Hastings within Gibbs algorithm and solve the model selection 
problem following a reversible jump MCMC approach. 

AMS codes: 62M10, 91B84, 62F15. 

Keywords: Bayesian Inference, Beta Autoregressive Processes, Reversible Jump 
MCMC. 



1 Introduction 

The analysis of time series data denned on a bounded interval (such as rates or 
proportions) has been a challenging issues for many years and still represents an 



^Corresponding author: fabrizio.leisen@gmail.com. Corresponding address: 
Universidad Carlos III de Madrid, Departamcnto de Estadi'stica, Callc Madrid, 126, 28903 
Gctafc (Madrid), Spain. 



1 



open issue. For modelling data defined on a bounded interval there are at least 
two alternative approaches. Historically the main approach applies a transform 
to the data in order to map the interval to the real line and then uses standard 
time series models. Typical examples of transformations a re the additive lq g- 



ratio transformation and the Box-Cox transformation fsee lAitchinsonl Il986f ). 
One of the earlier and relevant contributions in this framework is IWailisI 1987 . 

In this paper we follow the second approach, that is based on a direct 
modelling on th e original sample space. Among the first contributions along this 



line we refer to iGrunwald et al.l |1993[ ] who suggest a multivariate state space 
model for times series dat a defin ed on the standard simplex. Another seminal 
contribution is iMcKenzid 19851 ] which introduces a new Beta autoregressive 
process for times series defined on the s tanda rd unit interval (0,1). In 



the recent years, iFerrari and Cribari-Netol [20041 ] introduce the general Beta 
regression model, showing that is more convenient to consider the data in 
the original sample space instead of using a transformation. In particular 
they use a repara metrization, which is often e mployed for the inference of 



Beta mixtures (see iRobert and Rousseau! |2002f'). that allows to iden tify the 



mean as a parameter of the distribution. iRocha and Cribari-Netol [20091 ] extend 
the Beta regression model and propose a Beta autoregressive moving average 
process which possibly includes exogenous variables in the dynamics. In 
an applied context Beta re gression models have been also recentl y used in 



Amisano and Casarinl [20071 ] for modelling dynamic correlation, in ICalabrese 



2010 



for modeling the recove ry rate as a mixe d rando m variable and in 



Billio and Casarin 2010a and Billio and Casarin l2010bll for model i ng the 



transition matrix of a Markov- switching model. In iBillio and Casarinl 2010b 



a Bayesian procedure for latent Beta autoregressive models of the first order 
with nonlinear conditional mean is presented. 

In particular, the main contribution of this paper is to propose a Bayesian 
estimation method that allows to estimate the parameters as well as the number 
of component s of Beta autoregressive model s. The parametrization used is the 



same given in IRocha and Cribari-Netol [20091 ] . We handle the autoregressive part 



of the model, while the moving average part will be object of future research. 
Without loss of generality, we focus our attention on the linear conditional mean 
process, which is appealing for forecasting purposes but is more challenging due 
to the constraints on the parameters. The extension to the nonlinear conditional 
mean case requires minor modifications of the procedure pro posed in this paper. 
Consid ering the parameter estimation, we extend the work of IBillio and Casarin 



2010b| to the case of Beta autoregressive models of order k and discuss the 
choice of priors for autoregressive coefficients, that becomes quite a challenging 
matter for a Beta autoregressive of general order. 

Moreover, in order to select the model order, we propose a Reversible 
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Jump Markov chain Monte Carlo (RJMCMC) a l gorithm for Beta autoregressive 
models, extending the works of iBrooks et al 



2008 



2003b 


and 


Ehlers and Brooks 


1 


Enciso-Mora et al. 2009 


for 



Greenl [l995^ is an extension of the 



Integer- Valued ARMA processes. 

The reversible jump algorithm (see 
Metropolis-Hasting algorithm in which the dimension of the parameter space 
can vary between iterates of the Markov Chain. From a Bayesian point of view, 
suppose to have a set of models M. = {Mi, M.2-, • • • } where the model Aik has 
a vector of parameters 8 k of dimension nk ■ The set A4 is indexed by a parameter 
k G /C. Then the joint distribution of (k, 8^) given the observed data x is 



7T((Mk) | X) OC L(X | (Mk)M(Mk)) 



(1) 



where L(x | (/c,#k)) is the product of the likelihood and the joint prior 
p((k,8k)) = p(8 k \k)p(k) is constructed from the prior distribution of 8 k under 
model A4k and the prior for the model indicator k (i.e. the prior for model Aik)- 
The RJMCMC algorithm uses the joint posterior distribution in Equation 
([1]) as the target of a Markov chain Monte Carlo sampler over the state space 
© = Ufc e ^:(/c, R nfe ) where the states of the Markov chain are of the form (/c,#k), 
and the dimension of which can vary over the state space. Accordingly, from 
the output of a single Markov chain sampler, the user is able to obtain a 
full probabilistic description of the posterior probabilities of each model having 
observed the data, x, in addition to the posterior distributions of the individual 
models. For a recent account about reversible jump algorithm see Fan and Sisson 
(2010). 

In this paper, in order to design an efficient reversibl e jump algori t hm for 



Bet a autoregressive processes we follow the strategy of IBrooks et all [2003b 



and lEhlers and Brooksl [20081 ] . We consider jump move and stochastic reverse 



jump move between spaces of different dimensions and calibrate the parameters 

o increase the acceptance rate of 
The Gaussian assumption allows 



the move (see 


Ehlers and Brooks 


Ehlers and Brooks 


2008 


to have a 



20081]). 



while in the case of the Beta processes this result is no more valid. In our 
contribution we propose to combine an algorithm for the approximation of the 
parameter posterior mode with the calibration strategy of the jump proposal. 
The mode approximation allows us to find a close form solution to the parameter 
choice of the proposal distribution. 

The outline of the paper is as follows. In Section 2 the Beta autoregressive 
process of order k (BAR(k)) is introduced with a suitable parametrization. In 
Section 3 the Bayesian inference is developed for the BAR(k) model under the 
choice of some priors. In Section 4 the RJ algorithm is developed for the 



3 



BAR processes, in particular we studied two strategies. In Section 5 some 
simulation results are shown and in Section 6 the new machinery is applied 
to an unemployment rate and capacity utilization datasets. 



2 The Beta AR(k) model 

Let us define a Beta autoregressive process {x t }t>o of the order k as follows 

x t \T? ~ Be{r] lt ,r] 2t ) (2) 

where the = <j({x s } s < t ) is the u-algebra generated by the process, Be(r] u , r] 2 t) 
denotes the type I Beta distribution and r] U > and r] 2 t > are the two 
parameters of the distribution, usually referred as shape parameters. The two 
parameters are J-" r measurable functions and depend possibly on the past values 
of the process. The process has the following transition density 

f(x t \x t . u x t . k ) = — ^ zxT~\l - x t )^-% tl) (x t ) (3) 

where B(a, b) is the Beta function with a, b > and k is the order of the process. 
In the following we will denote with BAR(/c) the fc-order Beta autoregressive 
process and assume k < fc max , with & max < oo the maximum order of the process. 

We considered the Beta distribution because it is a fairly flexible distribution. 
It is unimodal, uniantimodal, increasing, decreasi ng or constant depend i ng; on 



the values of its parameters. We refer the reader to lGupta and Nadaraiah 



2004 



for a review on the families of Beta distributions and to iKotz and van Dorp 



2004| for a review on the finite range distributions. We consider the Beta 
of the first type, because the first and second moments of this distribution 
have a simple expression, which makes it easy to deal with the parameter 
identification issues. In fact, as suggested by many parts in the literature, 
the parameters identification should be accurately discussed. For inference and 
model interpretation purposes it is desiderable that the two parameters r]u and 
r]2t of the Beta distribution are not jointly determining both the shape and the 
moments of the distribution. 

In this work, in order to have an exact identification of the parameters 
of the conditional mean of the process and of the shape parameter, we 
co nsider the parametrization o f the Beta distribution suggested for example 



in 



Robert and Roussea u 2002 J in the con t ext o f Bay esian inference for Beta 



mixtures and in Ferrari and Cribari-Neto 20041 and Rocha and Cribari-Neto 



20091 ] in the context of maximum likelihood inference for general beta regression 
models. 
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In the chosen parametrization the conditional distribution of the process at 
time t is 

x t \Tt ~ Be(r)t4> t , (1 - r) t )4>t) (4) 

where <j>± and r\ t represent the p recision and location parameters respectively (see 
Rocha and Cribari-Netd 2009]). In this parametrization the conditional mean of 
the process coincides with the location parameter and the variance is a quadratic 
function of the location parameter as well 



E(x t | F t -i) = Vt, V(x t |7i_i) = rj t {l - Vt )/(1 + t 



(5) 



The process naturally exhibit heteroscedasticity, thus, in the following, we 
assume that the precision parameter is constant over time, i.e. 0t = <f>- 

We define a = («o, «i, . . . , oik)', X-s-.t = (x s , . . . , x t )', with s < t and 
z t = (1, xj_ 1 . i _ fc ) / . We assume that rjt = a'z t is a linear combination of a 
constant and of the k past values of the process. We will assume that at G A^+i 
where A fc+1 = {a. G (0, l) fc+1 | ^*L a« G (0,1)} and refer to it as convexity 
constraints. 

In Fig. [T] there are some sample paths of a Beta process of the third order. 
The paths are given for different values of the precision parameter <fi and of 
the constant term ckq. F° r higher values of (f> the process exhibits less volatility 
(upper- left chart). The larger the value of the constant term the greater is 
stationary mean of the process (upper- right chart). Finally we observe that the 
conditional mean of the process rj t G [rj, fj], where 77 = a and fj = ao + . . . + aj.. 
When 7] < 1/0 and fj > ((f) — 1) /(f), then (prjt < 1 and 0(1 — rjt) < 1 and 
the transition density of the process is anti-unimodal. In this case the process 
exhibits a switching-type behavior (see the left and right charts at the bottom 
of Fig. EEJ. 

This process extends to t h e ord er k the first-order Beta process proposed 

and then discussed in a latent variable 
The Beta process considered in 



111 



Nieto-Baraias a nd W alker 



2002 



framework by lAmisano and Casarinl [2007 



our work represents a special case of the beta regression model proposed in 
Ferrari and Cribari-Netol 20041 and of the beta ARMA process introduced by 
Rocha and Cribari-Netol [2009J . We focus on the linear case because it is 
particularly suitable for forecasting purposes and is more difficult to estimate 
than the nonlinear case due to the constraints on the parameters, which are 
necessary in order the process to be defined. In particular we show how to 
deal with the first-order stationarity constraints in the inference procedure. The 
Bayesian inference approach presented in the following apply in a straightforward 
way to the beta ARMA pro cess with non-linear conditional mean given in 
Rocha and Cribari-Netol 120091. 
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Figure 1: Simulated trajectories of a BAR(3) process for different parameter 
settings. Up-left: the effect of the precision parameter G {20, 100} for 
a = (0.17,0.03,0.1,0.60). Up-right: effect of the constant term chq for <fi = 100 
and (cti, Q!2, 0:3) = (0.03,0.1,0.60). B ottom- left and bottom-right: anti-unimodal 
transition distribution and switching-type trajectories of the BAR(3) process for 
different values of <p (<p G {0.1,0.9}) and with a = (0.46,0.03,0.01,0.30). Both 
of the cases correspond to rj = 0.46 < 1/0 and f] = 0.8 > (<p — l)/<fr. 

3 Bayesian inference 

The likelihood function of the model is 

T 

C(a, 0|x, o:T ) = J] BM, (1 - vt^xf^il - xt)*-**- 1 (6) 

t=to 

where x to:T = (x to , . . . , xt)' and t = /c max + 1. Note that we consider an 
approximated likelihood because for a Beta process of the order k < /c max we 
assume the observations start in t = fc max + 1 and thus forget the first (fc max — k) 
observations on Xt- Moreover in the following we will assume that the first 
fc max initial values of the process are known. It is possible to include the initial 
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values in the inferen ce process following, for example, the approach given in 
Vermaak et al.l 2004| for the Gaussian autoregressive processes. 



3.1 The priors 

The constant term and the coefficients of a BAR(k) belong to the set Ak+i, thus 
the usual assumption of Gaussian prior distribution has to be modified. In the 
following we consider some alternative prior specifications. 

We first considered a multivariate normal a ~ N'k+i {y, T) with mean v and 
variance T, truncated to the set A^+i. In the following we will denote with 

/(a) oc exp |-I(a - u)'T-\ot - i/)} I Afc+1 (a) (7) 

the density function of the prior on a., where Ia(%) is the indicator function. 
This prior distribution is given, for the case k — 1, in the upper- left chart of 
Fig. [2j In the example we assumed u = (l/(k + 2), l/(k + 2))' and T = 0.1I 2 , 
where I n is the n-dimensional identity matrix. Using this prior it is not easy 
to have a uniform prior and at the same time to guarantee the positivity of the 
parameters of the Beta process transition density, i.e. r] t <p > and (1 — r] t )4> > 
VI The truncation on the simplex of the normal prior distribution can not 
guarantee that the inference procedure returns parameter values which satisfy 
at the desired constraints, and can generate numerical problems in the Monte 
Carlo procedures employed in the inference process. 

In order to prevent th e posterior to take va l ues n ear the boundaries of the 



parameter space we follow iRobert and Rousseau! [20021 ] and introduce a repulsive 



factor around the boundaries of the standard simplex defined in the previous 
section. We observe that rj < rjt <fj and propose the following prior distribution 
conditional on <fi 

f(a) oc exp |-^(« - vyr-^a - i/) J exp j-^-^— — J I Afc+1 (a) (8) 

where k is an hyperparameter. In the upper-right and bottom-left charts of Fig. 
[2] we show, for the bivariate case, the shape of this prior distribution conditional 
on = 10 for two values of the hyperparameters (k = 5 and k = 10). The 
multiplicative factor creates low density regions near the boundaries for all t. 
In both the simulation experiments and the real data applications considered 
in Section El this kind of prior contributes considerably to avoid the numerical 
problems in the evaluation of the posterior density and of its gradient and Hessian 
which are needed for the Monte Carlo based inference procedures. We note the 
solution of the numerical problems in the evaluation of the posterior distribution 
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Figure 2: Prior distributions for a in the simplex A^+i for k = 1. In each graph, 
the level set (gray areas) and the sum-to-one constrain (solid line). Truncated 
normal prior (up-left) with parameters values v = (l/(k + 2), l/(k + 2))' and 
T = 0.1/2- Modified normal prior conditional on (f) = 10, with parameters 
v = (l/(k + 2), I/O + 2))' and T = 0.1/ 2 for k = 5 (up-right) and k = 10 
(bottom-left). Multivariate Beta-type prior (bottom-right) with parameters 
values u = (k+l,k + l)' and 7 = (k + 2, k + 2)'. 



is due to the fact that the prior distribution behaves like a penalizing term for 
the likelihood function and the penalization allows the algorithm to account for 
the constraints on the parameters. As an alternative, we consider a multivariate 
distribution which is naturally defined on A^+i and which has density function 




where Ai = 1 — Y^j=o a i an d the parameters v = (v , . . . v^)' G M^L +1 and 

7 = (7o,...,7*)'eMt +1 - 

This distribution has been obtained by considering a set of independent Beta 
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random variables u$ ~ -Be(z/j, 7^), with z = 0, . . . ,/c, and then the multivariate 
transformation 

j'-i 

a i = w i II^ 1 ~~ u *)> forj = 0, (10) 

i=0 

This stochastic representation constitutes a natural way to generate random 
numbers from the multivariate distribution defined above. It is easy to show that 
thanks to this representation the constraints at G A^+i on the parameters of the 
Beta process are satisfied, in particular Yli=o a « = 1 — 11^=0 ~~ Vi ) — Bottom- 
right chart of Fig. [2]shows this prior for the parameter setting v = (k-\-l,k + l)' 
and 7 = (k + 2, k + 2)'. It can be seen from the picture that this prior gives a 
low probability mass to the values of a near the boundaries 77 = and fj = 1. 
The precision parameter <fi is positive, thus we assume a gamma prior 

0~£a(c,d) (11) 

with parameters c and d and denote with /(</>) the associated density function. 



3.2 A MCMC algorithm 



In order to approximate the posterior mean for the parameters of the BAR(/c) 
process we consider a Gibbs algorithm. The full conditional distributions can 
not be si mulated exactly and are si mulated by a Metropolis-Hastings (M.-H.) 
step (see Chib and Greenberg 19951 ] ) . The full conditional distribution and the 
associated simulation method are described in the following. 
The full conditional distribution of a. is 



7r a 



>, x to;T ) cx exp - BW, (1 - Vt)4>) + A ^ ( 12 ) 



t=t 



t=t 



where A t = log(xt/(l — xt)). To simulate from this distribution we employ a 
M.-H. algorithm with a proposal distribution which makes use of the inform ation 
on the local structure of the posterior surface (see lAlbert and Chibl [19931 ] and 
Lenk and DeSarbol |2000 ]). Consider the second-order Taylor expansion of the 
log-posterior, g(ct), centered around ay\ 



g{a) w g{a. 



(a-a (j) )V (1) ^(a (j) ) + -(a-a (j ' ) )'V (2) ( ? (a (j) )(a-a w ) (13) 



where cxS^ represents the approximated mode of the posterior, then at the j-th 
iteration of the M.-H. step we generate a candidate as follows 



(14) 
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where £( J ' -1 ) = — (V^g^d:^' -1 ))) . The acceptance rate of this M.-H. step 
can be easily found. Due to its relevance in the model selection procedure, the 
approximation method for the posterior mode will be discussed in Section HI 
The full conditional distribution of d> is 



7T 



a,x to:T ) oc exp log 5(^0, (1 - rj t )(j)) + <p{A t r] t + log(l - x t ))j 1/(0) 

V t=t J 

(15) 

We simulate from the full conditional by a M.-H. step. We consider a Gamma 
random walk proposal and at the j-th step of the algorithm, given the previous 
value (jfi~ x > of the chain, we simulate 

W ~ ga{a{^-^) 2 , a^ j - 1] ) (16) 

and accept with probability 

. f 7r(0W|a,x fo:r ) r^^- 1 )) 2 )^- 1 ))^) 2 - 1 ^)^^ 2 1 (]7) 



4 Model Selection 



In this work we propose a Reversible- Jump MC MC (RJMCMC) appr oach 



to model selection for Beta autoregressive. See iBrooks et al.l [2003bl | for 
an introduction to effic ient proposal design in RJMCMC algorithms and 
Ehlers and Brooksl 20081 ] for an application of so me efficient adap t ive pr oposals 



to Gaussian autoregressive processes. See also IVermaak et al.l [20041 ] for an 
alternative RJMCMC schemes for Gaussian autoregressive. To the best of our 
knowledge only a few studies exist on the application of RJMCMC algorithms 
to non-Ga ussian autoregressive mod els. Among the other we refer the interested 



reader to lEnciso-Mora et al.l [20091 ] who proposed a RJMCMC algorithm for 



model selection in integer valued ARMA models. 



2008 



In this work we extend the RJMCMC approach of lEhlers and Brooks 
to the case of the Beta autoregressive proces ses of a unknown order. We con sider 
the definition of Beta autoregressive given in lRocha and Cribari-Netol 2009| and 
restrict our attention to the case of conditionally linear processes because the 
inference in this context is more difficult due to the restriction on the parameters. 
Nevertheless the proposed RJMCMC algorithm can be easily extended to the 
case of a Beta process with a nonlinear conditional mean. 

In the following we will show how to deal, in a Beta autoregressive context, 
with some special constraints which can be imposed in the inference process. 
First we show how to deal with first order stationarity constraints. We suggest to 
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move between BAR(/c) and BAR(/c + 1) processes and to consider the innovation 
process naturally associa ted to a BAR(&;) to design this move. As suggested in 
Ehlers and Brooks! |2008j we choose the parameters of the proposal for jumps 



between BAR(fc + 1) and BAR(fc) in such a way to make the acceptance rate 
close to one. 

Secondly we consider convexity constraints on the parameters of the 
conditional means and suggest to use a trans-dimensional MCMC chain which 
moves between spaces of any finite dimensions k and k' by employing a suitable 
proposal distribution. In order to make the acceptance rate close to one we 
suggest to combine the second-order proposal strategy with a Newton-Rapson 
type sequential approximation of the posterior mode. 

4.1 Stationarity Constraints 

Let us define the innovation process ft = x t — r] t , then 

ft = (1 - a x L - ... - a k L k )x t - a (18) 

where L is the lag operator. The innovation process can be written in terms of 
reciprocal roots 

k 

ft = - X i L ) x t - «o (19) 
i=i 

and it is stationary in mean if |Aj| < 1. 

Let ft be the innovation term associated to BAR(fc) and f * the one associated 
to a BAR(/c + 1) then 

k 

C t = (1 - rL) JI(1 - \jQxt - a* 
i=i 

= (l-rL)(ft + a )-Q!o 

= (l-rL)ft (20) 

if «o = «o- 

Thanks to this representation of the location parameter, the likelihood 
function of the BAR(/c + 1) process will write as 



£(a o ,A,r,0|xt o:T ) = J[ B (rgh (1 - rjD^xf^il - xtf 1 ^*- 1 

t=t 

T 

I B(g t (r)<p, (1 - gtirM-'xf (r) ^(l - xtf^^ 1 (21) 



t=t 
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where g t (r) = x t — — r£t_i), A (A 1; . . . , A&)' is the vector of reciprocal roots 
of the BAR(/c) process which coincide with the first k reciprocal roots of the 
BAR(& + 1) and r G (-1, 1) is the k + 1-th reciprocal root of the BAR(/c + 1) 
process. 

We assume independent Beta priors for the reciprocal roots of the process 
and in order to assure that the constraints on the autoregressive coefficients are 
satisfied (see Appendix A), we let Uj ~ Be(a,b) and define Xj = 2uj — 1 for 
j = 1, . . . , k + 1, with Afc+i = r. We denote with /(A) and /(r) the densities 
associated to these priors. 

In order to design a reversible jump algorithm, which allows for jumps 
between posterior distributions with different dimensions, we need a model prior, 
a proposal distribution for the model order and for the new parameters and 
finally a link function between the parameter spaces of the two models. In 
the following we will assume that the model prior associated to k is uniform: 
k ~ ^{i,...,fe max }) where /c max is the maximum order of the process and denote with 
f{k) the associated density function. As a proposal Pk,k+i for the order of the 
new model we consider a Bernoulli distribution with probability 1/2. We assume 
a truncated normal distribution as a proposal for the new root, i.e. r ~ A/"(/i, a 2 ) 
truncated to (—1,1) and denote with q{r) its density. Finally, as suggested in 



Ehlers and Brooks! [20081 ] . we consider an identity dimension matching function 
/(A, Afc+0 = 

The acceptance ratio of the move from a fc-order autoregressive to an 
autoregressive of order (k + 1) is 

A £{ao,\r,<f)\xt o: T)f{ao)fWf{<P)f{r)f{k + l)p k+hk 1 

k ' k+1 ~ £(a o ,A,0|x to:T )/( ao )/(A)/(0)/(% fc , fe+1 q(r) 1 1 1 } 

where \ J\ is the Jacobian of the transformation. Under the above assumptions 
the acceptance rate simplifies to 

a _ £(tt(hA,r,0|x to:T )/(r) 1 

Ak > k+1 ~ £KA,0|x to:T ) g(r) {Z6) 



As suggested in lEhlers and Brooksl |2008| the likelihood of the BAR(fc) and 



the BAR(/c + 1) models are identical at r = 0, which is a weak non-identifiability 
centring point. We use the centring point to find the parameters of the second- 
order proposal. To this aim we consider the first and second order derivatives of 
the log-acceptance ratio. 

The log-acceptance ratio of the move from a Beta process of order k to one 
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of order k + 1 is 

log A kM1 = C + ^(r - Ai ) 2 - (log B(g t (r)<t>, (1 - g t {r))<f>)) 

t 

T 

+ ((9t(r)<f> ~ 1) MX*)) + E (((1 " 9t(r))<f> - 1) log(l - x t )) 

t=t t 

+{a - 1) log(l + r) + (b - 1) log(l - r) (24) 

where C is the log of a normalizing constant which does not depend on r. The 
first- and second-order derivatives of the log-acceptance ratio are 

T 

d r \ogA Kk+l = J2 6-10 (* (0) ((1 - 9t(r)m - ¥°\g t (r)<p)) 

t=to 

+ V (log(x t ) - log(l - + + — — - (25) 

*— ' <T 1 + r 1 — r 

t=t 

T 

9 rr logA M+1 = -^^ 2 -i0 2 (^ (1) (^(r)0) + -9t(r))<f>)) 

t=t 

1 a- 1 6-1 



a 2 (1+r) 2 (l-r) ! 



(26) 



where \E f( - ^ and ^P 1 - 1 ) are the digamma and trigamma functions respectively. 

We evaluate the first- and second-order derivatives at r = 0, solve for \i and 
a 2 and find the parameters of the proposal 

((a-fr)+<K^ + tA)) m 

» ~ ( a+6 )_ 2 + 02 [ / 3 

^ = (a + b) -2 + 0^3 (28) 

where 



= ^6-i(^ (o) ((l-^ + 6)0)-* (O) ((^-6)0)) 
t=t 

T 

u 2 = ^6-iiog(^/(i-^)) 

t=to 
T 

^ = ^et-i(* (1) ((^-6)0)+* (1) ((i-^+6)0)) 



t=t 
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4.2 Convexity Constraints 

In order to deal with the convexity constraints on the parameters in the 
conditional mean of the BAR process we suggest to use RJMCMC moves between 
spaces with any positive dimensions difference, and to control for the constraints 
by a suitable choice of the proposal distribution for the parameters. 

Given the parameter vector ol G A k+1 , the RJMCMC proposes a jump to the 
space Afc/ + i by generating k! from a discrete distribution p^^i and then proposing 
a /c'-dimensional parameter vector u G Afc/+i from the distribution q(u), which 
we assumed to be defined on the standard simplex. For the dimensions matching 
we consider the transform (a', u') = (u, a), which has unitary Jacobian. The 
acceptance probability of the proposal is 

£(u,0|x to:T )/(u)/(0)p fc /p fe / ifc q(a) 



C(a, (j)\x t0 :T)f(Oi)f((j))pk>Pk,k> ?(u) 

As proposal we consider a parametric family of distributions and choose the 
parameters in such a way that the log-acceptance rate is approximately equal to 
zero. 

We consider a second-order methods and focus on the gradient and the 
Hessian of the log-acceptance rate with respect to (u, a). Note that the gradient 
naturally splits in the two subvectors, that are the gradient with respect to u 
and the one with respect to a. The cross derivatives in the Hessian are null. In 
the following we consider the case for u, the case for a. being similar. 

We choose the parameters of the proposal such that 



V (1) logA fe , fc , = £(A t -tf(°>M) + ¥< o >((l-»/O0))^ 
t=t 

+V (1) log /(u) - V (1) log q(u) = 

T 

V (2) logA fc , fc , = -Y,^ {l \vt<P) + ^ {1 \(l-Vt)^))<P 2 *t< 

t=to 

+V (2) log/(u) - V (2) logg(u) = 

where and have been defined in the previous section. 

It sho uld be noticed that as opp osite to the Gaussian autoregressive model 



studied in lEhlers and Brooks! [2008j , the gradient and the Hessian depend on u. 
In this work we decide to evaluate the derivatives at the approximated posterior 
mode u, which is defined on the fc'-dim space. 

As an example let use consider a Gaussian distribution as proposal with 
mean fi k , and variance Assume that the prior is the Gaussian distribution 
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truncated on the simplex, then we solve the above system of equations (see 
Appendix B for the gradient and the Hessian in the Gaussian case) with respect 
to \x k i and E^/ and find 

T 
t=t 

/% = u + E fe , ( T-/(u - u v ) -^2 (A- * (0) M) + ^ (0) ((1 " VtW) fat) 

\ t=t J 

By applying the same procedure for the derivatives with respect to a we obtain 
a full modal centering procedure, which promotes jumps between modes in the 
k- dimensional and fc'-dimensional spaces. 

We do not know the value of the posterior mode and in order to find an 
approximation we suggest to use a Newton-Rapson procedure. More specifically 
at the iteration j of the RJMCMC procedure the approximated mode is 
determined by the following recursion 

a (i) = uO-i) _ ^-iJvW^uO - - 1 )) (29) 

where E^" -1 ) = — (V^ 2 ^(u^ _1 - ) )) . The gradient and the Hessian of the log- 
posterior, V 1 - 1 ^ and V^g respectively, are 

T 

v«s(u) = "^2 (A t - v^M) + - nt)*)) fat + v« io g /(u) 

t=t 

T 

V^(u) = -J2 + * (1) ((1 - Vt)<t>)) <p 2 zt< + V® log/(u) 

t=t 

In Appendix B we provide the analytical expressions for the gradient, 
log /(u), and the Hessian, V*- 2 ^ log /(u), of the log-prior, for the different 
choices of the prior discussed in Section After an initial learning period the 
posterior mode in the various spaces of different dimension is reached with a 
certain tolerance value and the log-acceptance rate of the jump move is close to 
zero as effects of the choice of the proposal parameters. 

Finally it should be noticed that the approximation u^') of the posterior mode 
in the jump step is also used in the M.-H. steps within the Gibbs sampler for 
simulating the parameters, thus no further computational burden is required for 
this approximation. 
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5 Simulation Results 



In this section we study, through some simulation experiments, the efficiency of 
the proposals for both within and between models moves. In order to check if the 
chain mix well we estimate the acceptance rate of the M.H. steps, the effective 
sample size and the convergence diagnostic statistics. 

In a first set of experiments we verify the efficiency, in terms of mixing, of 
the proposals of the Metropolis-Hastings step of the moves for a given dimension 
and for the different choices of the prior distribution and parameter settings. 

In each experiment we proceed as follows. For each model order k and 
parameter setting 6' = (ck',0), we generate 50 independent random samples 
with size n = 300 from a BAR(k). On each dataset we iterate the MCMC 
algorithm, defined in the previous sections, for N = 10000 times. We consider 
dataset of n = 300 observations because this is the typical sample size of the 
applications to real data we have in mind and that will be presented in the next 
section. This allows us to estimate the magnitude of the parameter estimation 
errors (root mean square error) associated to the inference procedure for the 
chosen sample size. 

As regard to the values of the parameters we consider two scenarios. In the 
first one we set the precision parameter = 20, which corresponds, as highlighted 
in Fig. [JJ to the case of high variability in the data. We will call low precision 
data all the dataset simulated with this value of the precision parameter. We 
expect that the parameter estimation is more challenging in this context. 

In the second scenario, which is more frequent in the applications we will 
consider in the next section, we set the precision parameter <ft = 100 that is a 
higher value than the ones considered in the first scenario. In this case the data 
exhibits less variability (see Fig. [T]). We will call high precision data all the 
dataset simulated with this precision value. 

For each scenario we consider different values for the parameters and different 
autoregressive orders. The resulting parameter settings, which have been used 
in the MCMC experiments, are given in Tab. [TJ We also evaluate the 
efficiency of the MCMC algorithm for different choice of the prior distribution 
and consider the truncated normal distribution with parameters v — {k + 2)~ 1 t 
and T = 100/ and the modified truncated normal distribution with parameters 
v = (k + 2)~ 1 i, T = 100/ and k — 10 and the Beta-type distribution with 
parameters v = (k + 1, . . . , k + 1) and 7 = (k + 2, . . . , k + 2). 

Fig. El shows typical raw output and progressive averages of N = 10000 
iterations of the MCMC chain for a (left chart) and <fi (right chart). In these 
figures, the initial value of the Gibbs sampler and the burn-in sample are 
included in order to show the convergence of the MCMC progressive averages to 
the true values of the parameters. In this example, we considered a sample 
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k 


0' = (a',0) 


0' = («',</>) 


1 


(0.32,0.5,20) 


(0.32,0.5,100) 


2 


(0.32,0.5,0.1,20) 


(0.32,0.5,0.1,100) 


3 


(0.32,0.5,0.1,0.03,20) 


(0.32,0.5,0.1,0.03,100) 


4 


(0.32,0.4,0.1,0.03,0.1,20) 


(0.32,0.4,0.1,0.03,0.1,100) 



Table 1: The parameter settings employed in the MCMC experiments. First 
column: the autoregressive order. Second and third columns: conditional mean 
parameters for low precision data (i.e. <p = 20) and high precision data (i.e. 
(f) = 100). 



of n — 300 observations simulated from a BAR(3) model with parameters 
(0.32,0.5,0.1,0.03)' and <p = 20. 



a 



For each set of 50 independent MCMC experiments of length N we estimate 
the RMSE, the average acceptance probability (ACC) of the M.-H. within Gibbs 
steps and the following quantity 



ESS = N l + ^corr(0(°\0W) 



(30) 



t=\ 



that is the effective sample size (ESS) of the MCMC sample for a parameter 9. 
In order to evaluate the RMSE and the other statics we consider N = 10000 
MCMC iterations and discard the output of the first 1000 iterations. 

In the various experiments we will also evaluate the number of MCMC 
iterations which are necessary to our MCMC algorithm to reach convergence. It 
should be noticed that the choice of the number of MCMC iterations is still 



an open issue see 



1998 



Robert and Casellal 20041 and iGuihenneuc-Jouyaux et al. 



In this work we combine a graphical inspection of the progressive 
averages of the parameter posterior distribution with the evaluation of 
a convergence crit e ria b ased on the Kolmogo rov-Smirnov ( KS) t est (see 
Robert and Casellal (2004| . Ch. 12). See also brooks et all |2003a| for an 
alternative use of the KS statistics for detecting convergence in RJMCMC chains. 
In the simulation experiments, for each component 9 of the parameter vector 6, 
we split the associated MCMC sample 9^\ j — 1, . . . , N in two subsamples 9± 
1 M and evaluate 



and 9 ^ with j 



KS = —sup 

M „ 



M 



M 



(0,ij)» 



(0,r?)( 



9 { 2 j ° h 



(31) 



where G is the batch size. The use of batches is necessary in order to obtain quasi- 
independent samples. The independence of the samples is one of the assumption 
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<=> 5000 10000 °^ SCJOO 10000 

MCMC Iter. MCMC Iter. 

Figure 3: MCMC raw output (gray solid lines) and progressive averages 
(dashed lines) for a (left chart) and <fi (right chart) estimated on a dataset 
of n = 300 observations simulated from a BAR(3) process with parameters 
a = (0.32,0.5,0.1,0.03)' and <fi = 20. The initial value of the Gibbs sampler 
and the burn-in sample are included in the MCMC sample in order to show 
the convergence of the MCMC progressive averages to the true values of the 
parameters. 

to have a known limiting distribution for the KS statistics. For each experiment 
we will show the average p- value of the KS statistics over the vector of parameters 
and over the last 100 iterations of the MCMC chain. 

We summarize the results of the MCMC experiments in Tab. [2j The results 
for the Beta-type prior and for the modified truncated- Gaussian are similar, 
thus the choose to present the results for modified truncated-Gaussian prior. 
From Tab. [2] we note that the RMSE of the autoregressive coefficients and of 
the (f) parameter tends to grow with the model order. This suggests that the 
higher the number of parameters to be estimated the lower the precision of the 
estimates. Besides, the precision of the <fi estimate is influenced by the number 
of parameters, as it decreases as the model order increases. 

Moreover, the RMSEs of the high precision data results are generally lower 
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Low precision data 


k 


Estimated RMSE 

ceo a i a 2 Q^3 a>i <fi 


ACC 


ESS 


KS 




Truncated- Gaussian Prior (u — (k + 2) t, T = 1007) 


1 


0.032 0.058 0.376 


0.176 


704 


0.534 


2 


0.033 0.043 0.023 0.996 


0.172 


630 


0.552 


3 


0.087 0.094 0.026 0.051 2.092 


0.163 


584 


0.523 


4 


0.041 0.011 0.019 0.075 0.032 3.727 


0.155 


538 


0.541 




Modified Truncated-Gaussian Prior (y — (k + 2) 1 t, T = 1007, k = 10) 


1 


0.033 0.051 0.392 


0.181 


1013 


0.556 


2 


0.032 0.055 0.021 0.916 


0.183 


853 


0.563 


3 


0.015 0.077 0.018 0.059 1.701 


0.192 


783 


0.574 


4 


0.030 0.023 0.013 0.059 0.034 2.564 


0.189 


740 


0.539 




High precision data 


k 


Estimated RMSE 

a$ cti «2 «3 a?4 


ACC 


ESS 


KS 




Truncated- Gaussian Prior (y — (k + 2) 1 t, T = 100/) 


1 


0.011 0.018 0.964 


0.392 


923 


0.513 


2 


0.029 0.047 0.031 1.815 


0.402 


827 


0.546 


3 


0.038 0.071 0.032 0.002 3.122 


0.422 


798 


0.593 


4 


0.021 0.038 0.037 0.007 0.029 6.430 


0.538 


778 


0.511 




Modified Truncated- Gaussian Prior [y = (k + 2) 1 i, T = 100J, n = 10) 


1 


0.017 0.021 0.392 


0.403 


1198 


0.511 


2 


0.020 0.028 0.001 1.101 


0.420 


1012 


0.534 


3 


0.031 0.063 0.003 0.002 1.539 


0.428 


941 


0.529 


4 


0.029 0.033 0.033 0.001 0.018 3.955 


0.509 


830 


0.542 



Table 2: Estimation results for different model orders, parameter settings and 
prior distributions. The results are averages over a set of 50 independent MCMC 
experiments on 50 independent dataset of n = 300 observations. On each dataset 
we run the proposed MCMC algorithm for N = 10000 iterations and then discard 
the first 1000 iterations in order to estimate the parameters. In each panel: 
model order (first column); estimated root mean square error (RMSE) (second 
column) for each parameter; average acceptance rate (ACC) and the effective 
sample size (ESS) (third and fourth columns) averaged over the two M.-H. chain 
in the Gibbs sampler; KS convergence diagnostic statistics (last column) with 
batch size G = 50 (average over the last 100 iterations). 
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than RMSEs of low precision data, with the exception of the <f> parameter that 
seems to behave differently from the autoregressive coefficients. However, in 
order to correctly evaluate the outcomes, we need to compare the RMSE to the 
value of the <fi parameter. For example, for the Truncated-Gaussian prior the 
estimated RMSE of <fi for the low precision BAR(4) is equal to 3.727 and the 
corresponding high precision RMSE is equal to 6.430. However, dividing this 
two values by the related <fi value (20 and 100 respectively) we get percentages 
of 18.635 and 6.43, showing a better performance for high precision results, as 
expected. 

The third column of Tab. |2] lists the average acceptance rate of the 
parameters. As it is clear from the table, data with higher precision show a 
noticeably improvement in terms of ACC compared to data with lower precision. 
However, we need to point out that Tab. 2 displays the average acceptance rate 
and the improvement is due mainly to the higher ACC value of the parameter, 
since the ACC of the autoregressive coefficients is very similar for both type of 
data. 

The ESS is the number of effectively independent draws from the posterior 
distribution and it is a measure of the mixing of the Markov chain. In our case, 
this measure decreases and denotes a worse mixing as the order of the process 
increases. Moreover, as we see in the fourth column of Tab. 2, ESS values are 
higher in high precision data, showing a better mixing than in low precision 
data. 

The average p-values of the KS statistic, in the last column of Tab. U\ take 
values close to 0.5 for the different precision of the data and the different model 
orders, suggesting the acceptance of the null hypothesis that the subsamples 
associated to the Markov chain have the same distribution, guaranteeing 
convergence. Besides, since Tab. 2 displays only the average p-values, we 
underline that over the last 100 iterations of the Markov chains the KS p-values 
improved, getting closer to 1. 

Comparing now the outcomes related to the priors used, the Modified 
Truncated-Gaussian prior performs better than the Truncated- Gaussian prior 
for both the low and high precision data. As we notice from Tab. [2, the 
RMSEs of the autoregressive coefficients are quite similar, but the RMSE of 
the 4> parameter calculated with the modified prior is sensibly lower. Moreover, 
the ACC and the ESS values of the modified prior are slightly higher, denoting 
a better mixing of the Markov chains. The p-values of the KS statistic indicate 
convergence to the target distribution, being close to the value of 0.5. 

Fig. H] gives a more detailed description of the behavior of the RMSE and 
of the ESS in the MCMC experiments. Fig. H] illustrates the parameters RMSE 
on the x-axis and the ESS of the two M.H. steps on the y-axis, for the 50 
MCMC simulation experiments. Large circles denote the truncated-Gaussian 



20 



0=20 




=100 



o 
o 

CM 



O 
O 
O 



w o 
w o 

LU 00 



O 
O 
CD 



O 
O 



• * * ..• • . s * * • 

• • " • 

• v v/ • * 




O Qo 
o 



oo' 
o 




P o 
oo 



o 
o' 



oo 




• BAR(1) 

• BAR(2) 
BAR(3) 
BAR(4) 



0.0 



0.2 



0.4 



0.6 
RMSE 



0.8 



1.0 



1.2 



Figure 4: ESS and RMSE of the 50 MCMC simulation experiments (averages 
over the parameter vector) for different autoregressive order of the BAR(k) 
(different gray levels of the circles) and different choice of the prior (bigger circles 
for the truncated Gaussian and smaller circles for the modified Gaussian prior). 
In the rows the ESS and the RMSE statistics for two different values of 0. 
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prior results, while small circles denote the modified truncated- Gaussian prior 
results. The different colors of the circles indicate the different order of the 
BAR(k). The top plot displays low precision results (0 = 20) and the bottom 
plot displays high precision results (0 = 100). 

For both high and low precision data, the RMSE grows with the order of 
the model, while the ESS decreases. Moreover, the modified prior gives more 
efficient estimates, since the small circles in Fig. 2] show an improvement as in 
the RMSE as in the ESS. 

The RMSE calculated with low precision data (0 = 20) is lower that the 
RMSE calculated with high precision data (0 = 100), but it is due to the fact 
that the circles depicted in Fig. H]are averages over all the parameters, including 
the parameter 0. Therefore, as illustrated above, in order to compare the outputs 
it is necessary to consider the different value of the parameters. 

Furthermore, we note that for the different model orders high precision results 
appear to be more scattered, while low precision outcomes appear to be more 
concentrated. Since this behavior is associated to a higher ESS value, this suggest 
that the Markov chains explore more freely the parameter space and that they 
are characterized by a better mixing. 

The aim of the second set of experiments is to study the relation between 
the size of the sample of observations and the model posterior probability. We 
consider a dataset of 500 observations simulated from a BAR(k) with k — 3 
and parameter values (a', 0) = (0.37,0.4,0.1,0.03,100). Then in order to 
estimate both the parameters and the autoregressive order we assume a modified 
truncated-Gaussian prior with parameters u = (k + 2)~ 1 l, T = 100/ and apply 
the RJMCMC algorithm presented in the previous section to subsamples of 
different size (from 100 to 500 observations). 

A typical output of a RJMCMC chain for a dataset of 300 observations is 
given in Fig. [5J Each chart shows the RJMCMC iterations for each subspace. 
The iterations of the chain for the precision parameter are given in the upper 
chart of Fig. |6j In the bottom chart of the same figure we show the model 
posterior probabilities. 

The results of the model selection procedure for the different sample size 
are given in Tab. [33 All the estimates are based on 100,000 iterations of the 
RJMCMC chain. In the specific example at hand, the last three rows in the table 
show two interesting results. First the estimated model order is not correct 
(k = 2 with k — 3) for a sample of 100 observations. Secondly the standard 
deviation is 3.1253 for n = 100 and decreases for increasing n, which means that 
the model posterior distribution is less concentrated around the mode for the 
samples of smaller size. 

It should be noticed that, while small-sample bias in the model order 
estimates has also been observed in RJMCMC-based model selection procedures 



22 



CD 
O 




O 

° (5 550 T555 1550 



CD 




O 

d (5 550 T555 1555 



CD 
O 










i 




ffl 











° (5 555 T555 1555 2000 



CD 

d 




° (5 55 T55 150 



Figure 5: RJMCMC model selection when the true model is a BAR(3) with 
a = (0.37,0.4,0.1,0.03)' and <fi = 100. In each graph, the raw output of the 
RJMCMC chain (gray lines) for a £ A&+i in spaces with different dimensions 
(& + 1) with jfc = l,...,6. 
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Figure 6: RJMCMC raw output for <fi (upper chart), the progressive estimates of 
the model probabilities (bottom chart), when the true model is a BAR(3) with 
a = (0.37, 0.4, 0.1, 0.03)' and = 100. 



for other non-Gaussia n autoregressive processe s, such as the integer valued 



ARM A processes (see Enciso-Mora et al. 2009| ). what we found new for the 



Beta autoregressive, is the high dispersion of the model order estimates in small 
samples. 

Both of these results may contribute to explain the shape of the model 
posterior distribution that we will obtain in the applications to real dataset 
presented in Section [6j 



6 Empirical Application 



6.1 Unemployment Data 

The dynamics of the unemployment r ate is certain ly o ne of the mo st studied in 
the economics time series analysis (see iBeanl 1994} and iNickell 19971 ] for an early 
review on unemployment models). Modeling and forecasting the unemployment 
rate still represent some of the most challenging iss ues in econometrics. S ee for 



example iNeftcil [1984} , Montgomery et al.l [1998} and iKoop and Potterl [1999} for 



some recent advances on nonlinear models. The unemployment rate is usually 
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Estimated Model Probabilities 


^\ n 
k 


1 nn 




ouu 


Ann 


^nn 


1 


0.06667 


0.09143 


0.10291 


0.09806 


0.00129 


2 


0.19208 


0.38611 


0.29345 


0.19384 


0.17828 


3 


0.18939 


0.43252 


0.35730 


0.65473 


0.75391 


4 


0.13245 


0.03387 


0.24522 


0.05255 


0.04449 


5 


0.10674 


0.02291 


0.00002 


0.00011 


0.00018 


6 


0.06682 


0.01910 


0.00001 


0.00033 


0.00013 


7 


0.05682 


0.00691 


0.00001 


0.00001 


0.00663 


8 


0.04682 


0.00149 


0.00001 


0.00001 


0.00012 


9 


0.04283 


0.00139 


0.00001 


0.00001 


0.00018 


10 


0.03891 


0.00109 


0.00042 


0.00001 


0.00001 


11 


0.01488 


0.00080 


0.00014 


0.00001 


0.00001 


12 


0.01455 


0.00055 


0.00022 


0.00001 


0.00001 


13 


0.01109 


0.00062 


0.00012 


0.00001 


0.00001 


14 


0.01078 


0.00027 


0.00011 


0.00001 


0.00001 


15 


0.00917 


0.00092 


0.00005 


0.00030 


0.00023 


Mode 


2 


3 


3 


3 


3 


Mean 


4.80121 


2.65058 


2.75522 


2.66801 


2.85226 


s.d. 


3.1253 


1.2256 


0.9854 


0.76153 


0.70101 



Table 3: Relation between sample size n (first row) and model order posterior 
(columns from one to six) for /c max = 15 when data are simulated from a BAR(3) 
with (ex! , (/)) = (0.37,0.4,0.1,0.03,100). We assume a modified truncated- 
Gaussian prior with parameters u — (k + 2)" 1 l, T = 1007. Approximation 
of the model order posterior and of its mode, mean and standard deviation (last 
three rows) is based on 100,000 RJMCMC iterations. 



characterized by relatively brief periods of rapid economic contraction and by 
relatively long periods of slow expansion. In this work we do not model the 
asymmetric behavior of time series but focus instead on another fundamental 
feature of this variable, that is the unemployment rate is naturally defined 
on a bounded interval, let us say the (0, 1) interval. Data transformation is 



usually applied to have data o n the real interval (see IWallisI jl987| ). Recently 



Rocha and Cribari-Netol [20091 ] suggested instead Beta autoregressive models for 



modeling the unemployment rate. 

Here we consider two interesting dataset (see Fig. [7]). The first one is the 
US unemployment rate (source: Datastream) sampled at a monthly frequency 
from February 1971 to December 2009. We are mainly interested in modeling 
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Figure 7: Up: US (solid line) and EU (dashed line) unemployment rates 
at a monthly frequency. Bottom: US capacity utilization rate (solid line), 
its estimated linear trend (dotted line) with intercept 70 = 0.843 and slope 
71 = —0.066 and the detrended capacity rate (dashed line). 



the economic cycle and thus consider deseasonalized data. This dataset is quite 
large (467 observations) when compared to other macroeconomics datas et and 
is one of the most studied set of data in econometrics (see for example iNickel 



1997] ). The other dataset is the deseasonalized unemployment rate of the Euro 
Area sampled at the monthly frequency (source: Datastream), from January 
1995 to December 2009. We will consider the aggregated unempl oyment rate 



for the 13 countries area. This is another well studied dataset (see iBeanl [1994 
and inference on this dataset could be challenging due to the limited amount 
of observations (180 observations). Moreover modeling and forecasting of this 
variable represents one of the most important issues for the European Central 
Bank and for the European institute of official statistics (Eurostat). 

We assume modified Gaussian prior for the autoregressive parameters and 
a uniform prior for the autoregressive order. For the RJMCMC algorithm we 
consider N = 100, 000 iterations and a discard the first 10,000 samples in order to 
obtain an estimate of the parameters and of the model posterior. The estimation 
results are given in Tab. HJ Figures [8] show the model posterior probabilities for 
the two time series. 
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US Unemployment Rate (k = 2) 


o k 








<\> 


ACC 




0.011 


0.547 0.272 




130 


0.392 


Qo.025 


0.006 


0.118 0.007 




126 




Qo.975 


0.017 


0.820 0.708 
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EU Unemployment Rate (k = 5) 


Ok 


«o 


CCl «2 «3 Q!4 «5 







ACC 


0k 


0.029 


0.544 0.076 0.010 0.024 0.011 




128 


0.278 


Qo.025 


0.024 


0.140 0.001 0.004 0.010 0.004 




123 




Qo.975 


0.034 


0.572 0.096 0.016 0.029 0.016 
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US Capacity Utilization Rate (k = 


6) 






Ok 


a 


Oi\ CS-1 «3 «4 «5 


a 6 


<\> 


ACC 


Ok 


0.397 


0.196 0.075 0.065 0.045 0.053 


0.049 


130 


0.398 


Qo.025 


0.390 


0.189 0.058 0.058 0.040 0.049 


0.030 


126 




Qo.975 


0.408 


0.207 0.079 0.074 0.067 0.060 


0.051 


130 





Table 4: Estimated parameters and the model order k for the US (first panel) 
and EU13 (second panel) unemployment rates and for the US capacity utilization 
rate (last panel). In the last column the acceptance rate (ACC) of the trans- 
dimensional jump move in the RJMCMC chain. 



6.2 Capacity Utilization 



We consider the capacity utilization rate. While unemployment rate deals with 
utilization of the labor as a production factor, the capacity utilization deals 
with all production factors (i.e. labor force and stock of capital) of an economic 
system or sector. A detailed definition of capacity utilization and a discussion 
on the different ways to get a statistical measure of this quantity can be found 



in 



Klein and Su 1979 



The cap acity utilization is a relevan t quan tity i n both economic th eory (see 



for example iBurnside and Eichenbaum 



1996 



Klein and Su 



an d iCoolev et al.l [19951 ] ) and in 
" 1979^. In time series analysis 



time series macroeconometrics (see 
both modelling and forecasting of this indicator are challenging issues which 
have a crucial role in the practice of economics analysis. In fact a decreasing 
capacity utilization is usually interpreted as slowdown of the aggregated demand 
and consequently a reduction of the inflation level. An increase of the capacity 
utilization reveals an expansion of the level of economic activity. 

In this work we consider the capacity utilization rate series for the US sampled 
at a monthly frequency from January 1967 to May 2010. The series refers to all 
the industry sectors and is seasonally adjusted (source: Datastream). 
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From a graphical inspection (see the bottom chart of Fig. |H]) we note that 
the series exhibits a negative trend. A deterministic trend could be naturally 
included in the Beta regression model with linear conditional mean by imposing 
some constraints on the slope and the intercept of the linear trend. These 
constraints can be imposed by a suitable specification of the prior distribution. 
However in this work we focus on the autoregressive components thus we follow 
a two steps procedure. 

First we define a normalized linear trend t/T, where T is the sample size, 
and introduce the constrained linear regression model 

x t = lo + lij; + £u with* = 1, . . . ,T (32) 

with 70 G (0, 1) and (71 + 70) G (0, 1). These parameter constraints insure that 
the residuals of the regression are in the (0, 1) interval. In the first step we 
calculate the de-trended capacity utilization rate x t = x t — 71^. The results of 
the trend extraction are given in Fig. [7J 

In the second step we estimate a Beta process on the variable x t . The results 
of the model selection procedure for the Beta process are in Tab. H] and Fig. |HJ 

7 Conclusion 

In this paper we consider Beta autoregressive models of order k for time series 
data defined on a bounded interval. We focused on conditionally linear mean 
processes which are particularly suitable for forecasting purposes and allows for 
an easy interpretation of the parameters in the conditional mean of the process. 
We proposed a Bayesian procedure in order to estimate the parameters of the 
model. We showed that, in this context, the Bayesian approach is suitable 
for dealing with the constraints on the parameters space through a suitable 
specification of the prior distribution. We introduced three different informative 
priors and studied, through some simulation experiments, the effects of the prior 
choice on the inference procedure. 

Moreover, we introduced an efficient RJMCMC algorithm for jointly 
estimating the parameters and selecting the model order. In different sets of 
simulation experiments, the proposed algorithm has been shown to be successful 
in estimating the true parameters for different parameter settings and different 
choices of the prior. We also found that the choice of the prior may have 
effects on the numerical mixing property of the RJMCMC chain by reducing the 
probability of the RJMCMC chain moves near the boundaries of the parameter 
space. In particular, within the three informative priors, the modified truncated 
Gaussian and the Beta-type priors could have positive effects on the mixing of 
the RJMCMC chain. 
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Figure 8: Model probabilities for BAR(k) on the US (upper chart) and EU13 
(middle chart) unemployment rates and on the US capacity utilization rate 
(bottom chart). 
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Through another set of simulation experiments we found that the proposed 
algorithm is able to find the correct order k of the autoregressive model. The 
RJMCMC allowed us to study the effects of the sample size on the model 
posterior. In some cases we found evidence of bias and low efficiency of the 
estimators of the model posterior. Finally we also tested the performance of 
the Bayesian inference procedure and of the RJMCMC algorithm by providing 
two applications to real data. The first one considers the unemployment rates 
of United Stases and Euro Area and the second considers the US capacity 
utilization. 

In this paper we focused on the Beta Autoregressive models with conditional 
linear mean, however the proposed RJMCMC algorithm can be extended to 
be applied to other type of Beta processes. In particular the authors are 
considering the inclusion in the inference process of the order of the moving 
average components and the extension to the Beta ARMA processes with 
nonlinear conditional mean. Both of the extensions can be considered with 
and without the inclusion of other explanatory variables. 
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Appendix A 

When a new reciprocal root r is proposed in the reversible jump MCMC 
procedure, the relationship between the autoregressive coefficients of the BAR(fc) 
and of the BAR(/c + 1) is 

\-a\L- a* k+1 L k+1 = [l - (1 - rL)(l - a x L - . . . - a k L k )] (33) 

Identification terms by terms gives the following recursion 

a\ = cti + r (34) 
a* = aj — rctj-i, for j = 2, . . . , k + 1 (35) 

with = 0. 

The constant term and the autoregressive coefficients belong to the standard 
simplex, that is 

k 

5^a!j<l, ai > 0, fori = 0,1,...,* (36) 

for each order k of the Beta process. Thus if the parameters of the BAK(k) 
belong to the standard simplex, the following condition 

fc+l k 

a *o + a *j - 1 ^ a ° + 1 1 - r ) a i + r - 1 ( 3? ) 

j=i i=i 
and the positivity constraints for a*, j = 2, . . . , k 

(«i + r) > 0, (aj - raj-i) > 0, -ra fc > (38) 

are satisfied provided that 

a* = a 

and for —a.\ < r < 0. 

Appendix B 

For the Gaussian prior we have 

V (1) log/(a) = -T _1 (a-i/) (39) 
V (2) log/(a) = -T- 1 (40) 
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For the modified Gaussian prior we have 

V«log/(a) = -r-\ a -v) + vf *__ )2 A (41) 

V<*>lo g /(a) = -^^^-^D-^-^M (42) 

where D = Le\ + e^', d = (e x - Da)' G M fe+1 , v = (1, . . . , 1)' G M fe+1 and 
e x = (1,0,..., 0)' G R fc+1 . 

For the transformed Beta prior the h-th element of log f(ac) is 



^ log /(«) = + ^ ^ ^ ^— (43) 

i=/i+l i=/i 



i+1 



and the (/i, Z)-th element of log f(oz), with I > h, is 

d ahai log /(a) = — — - 2^ + 2^ "^T" ( 44 ) 

h i=i+i 1 i=i 1+1 
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